Posterior Approximations

This notebook outlines the strategies frankenz can use to approximate the photometric posterior and compares how well they do relative to brute-force approaches.

Setup


In [1]:
from __future__ import print_function, division
import sys
import pickle
import numpy as np
import scipy
import matplotlib
from matplotlib import pyplot as plt
from six.moves import range
import gc

# import frankenz code
import frankenz

# plot in-line within the notebook
%matplotlib inline

np.random.seed(2018)

In [2]:
# re-defining plotting defaults
from matplotlib import rcParams
rcParams.update({'xtick.major.pad': '7.0'})
rcParams.update({'xtick.major.size': '7.5'})
rcParams.update({'xtick.major.width': '1.5'})
rcParams.update({'xtick.minor.pad': '7.0'})
rcParams.update({'xtick.minor.size': '3.5'})
rcParams.update({'xtick.minor.width': '1.0'})
rcParams.update({'ytick.major.pad': '7.0'})
rcParams.update({'ytick.major.size': '7.5'})
rcParams.update({'ytick.major.width': '1.5'})
rcParams.update({'ytick.minor.pad': '7.0'})
rcParams.update({'ytick.minor.size': '3.5'})
rcParams.update({'ytick.minor.width': '1.0'})
rcParams.update({'axes.titlepad': '15.0'})
rcParams.update({'font.size': 30})

Data

Following the previous notebook, we will stick with our mock SDSS data.


In [3]:
survey = pickle.load(open('../data/mock_sdss_cww_bpz.pkl', 'rb'))  # load data
types = survey.data['types']  # type flag
templates = survey.data['templates']  # template ID
redshifts = survey.data['redshifts']  # redshift
mags = survey.data['refmags']  # magnitude (reference)
phot_obs = survey.data['phot_obs']  # observed photometry
phot_err = survey.data['phot_err']  # photometry error
phot_true = survey.data['phot_true']  # true photometry
Nobs = len(types)

Recap

We are interested in constructing an estimate for the population redshift distribution $N(z|\mathbf{g}) = \sum_g P(z|g)$ and individual redshift distributions $P(z|g)$ for a set of $N_\mathbf{g}$ observed (unlabeled) galaxies $g \in \mathbf{g}$ by projecting our results onto a relevant (possibly noisy) basis composed of $N_\mathbf{h}$ training (labeled) galaxies $h \in \mathbf{h}$ with $N_{\mathbf{h}}$. This constitutes a "big data" approximation which assumes that we can reduce a continuous process to a discrete set of comparisons.

Given a prior over our basis $P(h)$, we can then write the posterior between $h$ and $g$ using Bayes Theorem as

$$ P(h|g) = \frac{\mathcal{L}(g|h)\pi(h)}{\mathcal{Z}_g} = \frac{\mathcal{L}(g|h)\pi(h)}{\sum_h \mathcal{L}(g|h) \pi(h)} $$

where $\mathcal{Z}_g$ is the evidence (i.e. marginal likelihood) of $g$.

Each galaxy has a set of observed flux densities $\mathbf{F}$ with PDF $P(\hat{\mathbf{F}}|g)$ observed in a set of $N_{\mathbf{b}}$ photometric bands indexed by $b \in \mathbf{b}$. Our likelihood is

$$ \mathcal{L}(g|h) \equiv P(\mathbf{F}_g | \mathbf{F}_h) = \frac{\int P(\hat{\mathbf{F}}_g | \mathbf{F}) P(\hat{\mathbf{F}}_h | \mathbf{F}) \pi(\mathbf{F}) d\mathbf{F}}{\int P(\hat{\mathbf{F}}_h | \mathbf{F}) \pi(\mathbf{F}) d\mathbf{F}} $$

where $\pi(\mathbf{F})$ is an $N_{\mathbf{b}}$-dimensional prior over the true flux densities.

Given our labeled objects $\mathbf{h}$ with corresponding redshift kernels $K(z|h)$, we get

$$ P(z|g) = \sum_h K(z|h)P(h|g) = \frac{\sum_h K(z|h)\mathcal{L}(g|h)\pi(h)}{\sum_h \mathcal{L}(g|h)\pi(h)} $$

which corresponds to a posterior-weighted mixture of the $K(z|h)$ redshift kernels.

Implicitly Defining Priors from Samples

Writing $\pi(h)$ is terms of physical parameters characterizing the underlying process can be difficult. However, given a representative collection of labeled data whose distribution over the observables (and the labels) follows the same distribution(s) probed by the observed data, we can use the relative number density of samples to implicitly define a prior over the physical parameters of interest and safely set $P(h)=1$.

We re-illustrate this below using a random galaxy sampled from our mock data.


In [4]:
# sample good example object
idx = np.random.choice(np.arange(Nobs)[(mags < 22.5) & (mags > 22)])

In [5]:
# compute loglikelihoods (noiseless) 
ll, nb, chisq = frankenz.pdf.loglike(phot_obs[idx], phot_err[idx],
                                     np.ones(survey.NFILTER),
                                     phot_true, phot_err,
                                     np.ones_like(phot_true),
                                     free_scale=False, ignore_model_err=True,
                                     dim_prior=False)

In [6]:
# compute color loglikelihoods over grid
mphot = survey.models['data'].reshape(-1, survey.NFILTER)
merr = np.zeros_like(mphot)
mmask = np.ones_like(mphot)
llm, nbm, chisqm, sm, smerr = frankenz.pdf.loglike(phot_obs[idx], phot_err[idx], 
                                                   np.ones(survey.NFILTER),
                                                   mphot, merr, mmask,
                                                   dim_prior=False, free_scale=True, 
                                                   ignore_model_err=True, return_scale=True)

In [7]:
# compute prior
mzgrid = survey.models['zgrid']
prior = np.array([frankenz.priors.bpz_pz_tm(mzgrid, t, mags[idx]) 
                  for t in survey.TTYPE]).T.flatten()

In [8]:
# define plotting functions
try:
    from scipy.special import logsumexp
except ImportError:
    from scipy.misc import logsumexp

def plot_flux(phot_obs, phot_err, phot, logl, 
              ocolor='black', mcolor='blue', thresh=1e-1):
    """Plot SEDs."""

    wave = np.array([f['lambda_eff'] for f in survey.filters])
    wt = np.exp(logl)
    wtmax = wt.max()
    sel = np.arange(len(phot))[wt > thresh * wtmax]
    [plt.plot(wave, phot[i], alpha=wt[i]/wtmax*0.4, lw=3, 
              zorder=1, color=mcolor) for i in sel]
    plt.errorbar(wave, phot_obs, yerr=phot_err, lw=3, color=ocolor, zorder=2)
    plt.xlabel(r'Wavelength ($\AA$)')
    plt.xlim([wave.min() - 100, wave.max() + 100])
    plt.ylim([(phot_obs - phot_err).min() * 0.9, (phot_obs + phot_err).max() * 1.1])
    plt.ylabel(r'$F_\nu$')
    plt.yticks(fontsize=24)
    plt.tight_layout()
    
def plot_redshift(redshifts, logl, ztrue=None, color='yellow', 
                  tcolor='red'):
    """Plot redshift PDF."""
    
    n, _, _ = plt.hist(redshifts, bins=zgrid, weights=np.exp(logl), 
                       histtype='stepfilled', edgecolor='black',
                       lw=3, color=color, alpha=0.8)
    if ztrue is not None:
        plt.vlines(ztrue, 0., n.max() * 1.1, color=tcolor, linestyles='--', lw=2)
    plt.xlabel('Redshift')
    plt.ylabel('PDF')
    plt.xlim([zgrid[0], zgrid[-1]])
    plt.ylim([0., n.max() * 1.1])
    plt.yticks([])
    plt.tight_layout()

def plot_zt(redshifts, templates, logl, ztrue=None, ttrue=None,
            cmap='viridis', tcolor='red', thresh=1e-2):
    """Plot joint template-redshift PDF."""
    lsum = logsumexp(logl)
    wt = np.exp(logl - lsum)
    plt.hist2d(redshifts, templates, bins=[zgrid, tgrid],
               weights=wt, 
               cmin=thresh*max(wt),
               cmap=cmap)
    if ttrue is not None:
        plt.hlines(ttrue, zgrid.min(), zgrid.max(), 
                   color=tcolor, lw=2, linestyles='--')
    if ztrue is not None:
        plt.vlines(ztrue, tgrid.min(), tgrid.max(), 
                   color=tcolor, lw=2, linestyles='--')
    plt.xlabel('Redshift')
    plt.ylabel('Template')
    plt.xlim([zgrid[0], zgrid[-1]])
    plt.ylim([tgrid[0], tgrid[-1]])
    plt.tight_layout()

In [9]:
zgrid = np.arange(0., 4. + 0.1, 0.05)
tgrid = np.arange(survey.NTEMPLATE + 1) - 0.5

# plot flux distribution
plt.figure(figsize=(24, 15))
plt.subplot(3,3,1)
plot_flux(phot_obs[idx], phot_err[idx], sm[:, None] * mphot,
          llm, ocolor='black', mcolor='blue', thresh=0.5)
plt.title('Likelihood (grid)')
plt.subplot(3,3,2)
plot_flux(phot_obs[idx], phot_err[idx], sm[:, None] * mphot,
          llm + np.log(prior).flatten(), 
          ocolor='black', mcolor='red', thresh=0.5)
plt.title('Posterior (grid)')
plt.subplot(3,3,3)
plot_flux(phot_obs[idx], phot_err[idx], phot_true, ll, 
          ocolor='black', mcolor='blue', thresh=0.5)
plt.title('Mag Likelihood\n(noiseless samples)')

# plot redshift distribution
mredshifts = np.array([mzgrid for i in range(survey.NTEMPLATE)]).T.flatten()
plt.subplot(3,3,4)
plot_redshift(mredshifts, llm, ztrue=redshifts[idx])
plt.xticks(zgrid[::20])
plt.subplot(3,3,5)
plot_redshift(mredshifts, llm + np.log(prior).flatten(),
              ztrue=redshifts[idx])
plt.xticks(zgrid[::20])
plt.subplot(3,3,6)
plot_redshift(redshifts, ll, ztrue=redshifts[idx])
plt.xticks(zgrid[::20])

# plot redshift-type joint distribution
mtemplates = np.array([np.arange(survey.NTEMPLATE) 
                       for i in range(len(mzgrid))]).flatten()
plt.subplot(3,3,7)
plot_zt(mredshifts, mtemplates, llm, thresh=1e-2,
        ztrue=redshifts[idx], ttrue=templates[idx])
plt.xticks(zgrid[::20])
plt.yticks(tgrid[1:] - 0.5)
plt.subplot(3,3,8)
plot_zt(mredshifts, mtemplates, llm + np.log(prior).flatten(),
        thresh=1e-2, ztrue=redshifts[idx], ttrue=templates[idx])
plt.xticks(zgrid[::20])
plt.yticks(tgrid[1:] - 0.5)
plt.subplot(3,3,9)
plot_zt(redshifts, templates, ll,
        ztrue=redshifts[idx], ttrue=templates[idx])
plt.xticks(zgrid[::20])
plt.yticks(tgrid[1:] - 0.5);


/home/joshspeagle/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:12: RuntimeWarning: divide by zero encountered in log
  if sys.path[0] == '':
/home/joshspeagle/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:26: RuntimeWarning: divide by zero encountered in log
/home/joshspeagle/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:42: RuntimeWarning: divide by zero encountered in log

Again, in practice we do not often have access to a fully representative training sample. We will return to methods to derive $P(h)$ later.

Dealing with Large Datasets

For most applications, we often will be dealing with a large sample of objects. We'll define a particular sample below following the previous notebook.


In [10]:
# training/testing set divide
sel = (phot_obs / phot_err)[:, survey.ref_filter] > 5.  # S/N > 5 cut
Nsel = sel.sum()

Ntrain, Ntest = 60000, 5000
train_sel = np.arange(Nobs)[sel][:Ntrain]  # training set
test_sel = np.arange(Nobs)[sel][Ntrain:Ntrain+Ntest]  # testing set
Nmodel = len(mphot)

print('Number of observed galaxies (all):', Nobs)
print('Number of observed galaxies (selected):', Nsel)
print('Number of models:', Nmodel)
print('Number of training galaxies:', Ntrain)
print('Number of testing galaxies:', Ntest)


Number of observed galaxies (all): 200000
Number of observed galaxies (selected): 67951
Number of models: 8000
Number of training galaxies: 60000
Number of testing galaxies: 5000

In [11]:
# initialize datasets
phot_train, phot_test = phot_obs[train_sel], phot_obs[test_sel]
err_train, err_test = phot_err[train_sel], phot_err[test_sel]
mask_train, mask_test = np.ones_like(phot_train), np.ones_like(phot_test)

In [12]:
# initialize asinh magnitudes ("Luptitudes")
flux_zeropoint = 10**(-0.4 * -23.9) # AB magnitude zeropoint
fdepths = np.array([f['depth_flux1sig'] for f in survey.filters])
mag, magerr = frankenz.pdf.luptitude(phot_obs, phot_err, skynoise=fdepths,
                                     zeropoints=flux_zeropoint)

# initialize magnitude dictionary 
mdict = frankenz.pdf.PDFDict(pdf_grid=np.arange(-20., 60., 5e-3), 
                             sigma_grid=np.linspace(0.01, 5., 500))

# initialize redshift dictionary
rdict = frankenz.pdf.PDFDict(pdf_grid=np.arange(0., 7.+1e-5, 0.01), 
                             sigma_grid=np.linspace(0.005, 2., 500))

# true distribution
rsmooth = 0.05
zpdf0 = frankenz.pdf.gauss_kde_dict(rdict, y=redshifts[test_sel],
                                    y_std=np.ones_like(test_sel) * rsmooth)

In [13]:
# plotting redshift distribution
plt.figure(figsize=(14, 6))

# all
zerr_t = np.ones_like(redshifts) * rsmooth
z_pdf = frankenz.pdf.gauss_kde_dict(rdict, y=redshifts,
                                    y_std=zerr_t)
plt.plot(rdict.grid, z_pdf / z_pdf.sum(), lw=5, color='black')
plt.fill_between(rdict.grid, z_pdf / z_pdf.sum(), color='gray',
                 alpha=0.4, label='Underlying')

# selected
zsel_pdf = frankenz.pdf.gauss_kde_dict(rdict, y=redshifts[sel],
                                       y_std=zerr_t[sel])
plt.plot(rdict.grid, zsel_pdf / zsel_pdf.sum(), lw=5, color='navy')
plt.fill_between(rdict.grid, zsel_pdf / zsel_pdf.sum(), 
                 color='blue', alpha=0.4, label='Observed')

# prettify
plt.xlim([0, 4])
plt.ylim([0, None])
plt.yticks([])
plt.legend(fontsize=20)
plt.xlabel('Redshift')
plt.ylabel('$N(z|\mathbf{g})$')
plt.tight_layout()


Method 1: Brute Force (BruteForce)

The simplest approach is the brute force approach where we attempt to fit all our models to the data. Brute force approaches can often be surprising fast because they can exploit quick matrix operations over large arrays. However, all brute force approaches scale as $\mathcal{O}(N_\mathbf{h})$ both in time and in memory usage and so quickly become infeasible when dealing with large training datasets/model grids.

We'll initialize two solvers: one based on fitting the entire training set (60k objects) in magnitude space and another based on fitting our underlying model grid (8k objects) in color space.


In [14]:
from frankenz.fitting import BruteForce

# initialize objects
model_BF = BruteForce(mphot, merr, mmask)  # model grid
train_BF = BruteForce(phot_train, err_train, mask_train)  # training data

In [15]:
# define bpz log(prob) function
def lprob_bpz(x, xe, xm, ys, yes, yms, 
              mzgrid=None, ttypes=None, ref=None):
    results = frankenz.pdf.loglike(x, xe, xm, ys, yes, yms,
                                   ignore_model_err=True,
                                   dim_prior=False,
                                   free_scale=True)
    lnlike, ndim, chi2 = results
    mag = -2.5 * np.log10(x[ref]) + 23.9
    prior = np.array([frankenz.priors.bpz_pz_tm(mzgrid, t, mag)
                      for t in ttypes]).T.flatten()
    lnprior = np.log(prior)
    return lnprior, lnlike, lnlike + lnprior, ndim, chi2

In [16]:
rsmooth = 0.01

# fit data
model_BF.fit(phot_test, err_test, mask_test, lprob_bpz, 
             lprob_args=[mzgrid, survey.TTYPE, survey.ref_filter])

# compute posterior-weighted redshift PDFs
mredshifts = np.array([mzgrid for i in range(survey.NTEMPLATE)]).T.flatten()
pdfs_post, gofs = model_BF.predict(mredshifts, np.ones_like(mredshifts) * rsmooth,
                                   label_dict=rdict, return_gof=True)
lmap_post, levid_post = gofs


/home/joshspeagle/anaconda3/lib/python3.6/site-packages/ipykernel_launcher.py:12: RuntimeWarning: divide by zero encountered in log
  if sys.path[0] == '':
Fitting object 5000/5000
Generating PDF 5000/5000

In [17]:
# compute likelihood-weighted redshift PDFs
pdfs_like, gofs = model_BF.predict(mredshifts, np.ones_like(mredshifts) * rsmooth,
                                   label_dict=rdict, return_gof=True, logwt=model_BF.fit_lnlike)
lmap_like, levid_like = gofs


Generating PDF 5000/5000

Now we'll generate predictions using our training (labeled) data.


In [18]:
# compute PDFs from training set using brute force
pdfs_bf, gofs = train_BF.fit_predict(phot_test, err_test, mask_test,
                                     redshifts[train_sel], 
                                     np.ones_like(train_sel) * rsmooth,
                                     label_dict=rdict, save_fits=False,
                                     return_gof=True)
lmap_bf, levid_bf = gofs


Generating PDF 5000/5000

In [19]:
def plot_zstack(data, labels, colors):
    
    Ndata = len(data)
    ncols = 4
    nrows = Ndata // ncols + 1
    
    # plotting
    fig = plt.figure(figsize=(10*ncols, 5*nrows))
    axes = []
    axes.append(plt.subplot(nrows, ncols, 1))
    plt.plot(rdict.grid, zpdf0, lw=5, color='black',
             label='Underlying')
    plt.plot(rdict.grid, pdfs_like.sum(axis=0), 
             lw=5, color='gray', alpha=0.75, 
             label='BPZ Color Likelihood (grid)')
    for i in range(1, Ndata+1):
        axes.append(plt.subplot(nrows, ncols, i+1))
        plt.plot(rdict.grid, zpdf0, lw=5, color='black', zorder=-1)
        plt.plot(rdict.grid, data[i-1].sum(axis=0), 
                 lw=5, color=colors[i-1], alpha=0.75,
                 label=labels[i-1])
    for i in range(Ndata+1):
        axes[i].set_xlim([0., 3.])
        axes[i].set_ylim([0., None])
        axes[i].set_yticks([])
        axes[i].legend(fontsize=20)
        axes[i].set_xlabel('Redshift')
        axes[i].set_ylabel('$N(z|\mathbf{g})$')
    fig.tight_layout()
    
# plot stacked PDFs
data = [pdfs_post, pdfs_bf]
labels = ['BPZ Color Posterior (grid)', 'Samples (BruteForce)']
colors = ['red', 'blue']
plot_zstack(data, labels, colors)


As before, the population redshift distribution $N(z|\mathbf{g})$ computed from our noisy flux densities is very close to that computed by the (approximate) BPZ posterior. Let's also take a look at the 2-D redshift distributions along with the quality of the PDFs.


In [20]:
from frankenz import plotting as fzplot

def plot_zstack2d(data, labels):
    
    Ndata = len(data)
    ncols = 4
    nrows = Ndata // ncols + 1
    
    # plotting
    fig = plt.figure(figsize=(12*ncols, 10*nrows))
    axes = []
    axes.append(plt.subplot(nrows, ncols, 1))
    hh = fzplot.input_vs_pdf(redshifts[test_sel], np.zeros_like(test_sel), 
                             rdict, pdfs_like, rdict.grid,
                             plot_thresh=0.15, smooth=1.5)
    plt.title('BPZ Color Likelihood (grid)')
    for i in range(1, Ndata+1):
        axes.append(plt.subplot(nrows, ncols, i+1))
        hh = fzplot.input_vs_pdf(redshifts[test_sel], 
                                 np.zeros_like(test_sel), 
                                 rdict, data[i-1], rdict.grid,
                                 plot_thresh=0.15, smooth=1.5)
        plt.title(labels[i-1])
    for i in range(Ndata+1):
        axes[i].plot([0, 100], [0, 100], 'k--', lw=3)
        axes[i].set_xlabel('Redshift (true)')
        axes[i].set_ylabel('Stacked P(z)')
        axes[i].set_xlim([0., 1.5])
        axes[i].set_ylim([0., 1.5])
    fig.tight_layout()

# plot 2-d stacked pdfs
plot_zstack2d(data, labels)



In [21]:
def cdf_plot(pdfs, levid, lmap, pct):
    csel = ((levid > np.sort(levid)[int(pct * Ntest)]) &
            (lmap > np.sort(lmap)[int(pct * Ntest)])) # remove bad fits
    fzplot.cdf_vs_ecdf(redshifts[test_sel], np.zeros_like(test_sel),
                       pdfs, rdict.grid, 
                       plot_kwargs={'color': 'blue', 'alpha': 0.8, 'lw': 3})
    fzplot.cdf_vs_ecdf(redshifts[test_sel], np.zeros_like(test_sel),
                       pdfs, rdict.grid, weights=csel,
                       plot_kwargs={'color': 'red', 'alpha': 0.8, 'lw': 3})
    plt.plot([0, 1], [0, 1], lw=3, color='black', ls='--')

def plot_cdfs(data, levids, lmaps, labels, pct):
    
    Ndata = len(data)
    ncols = 4
    nrows = Ndata // ncols + 1
    
    fig = plt.figure(figsize=(8*ncols, 7*nrows))
    plt.subplot(nrows, ncols, 1)
    cdf_plot(pdfs_like, levid_like, lmap_like, pct)
    plt.legend(['All objects', 'Bad fits removed', 'Uniform'], 
               loc='best', fontsize=18)
    plt.title('BPZ Color Likelihood (grid)', fontsize=28)
    plt.tight_layout()
    for i in range(1, Ndata+1):
        plt.subplot(nrows, ncols, i+1)
        cdf_plot(data[i-1], levids[i-1], lmaps[i-1], pct)
        plt.title(labels[i-1], fontsize=28)
        plt.tight_layout()

levids = [levid_post, levid_bf]
lmaps = [lmap_post, lmap_bf]

# plot results
plot_cdfs(data, levids, lmaps, labels, 0.05)



In [22]:
# clear memory
del train_BF
gc.collect()


Out[22]:
18246

Method 2: Nearest Neighbors (NearestNeighbors)

We've now demonstrated that:

  1. The original BPZ posterior is an unbiased probe of the underlying distribution (as expected, since we constructed our mock data from the same priors).
  2. Our magnitude-based likelihoods computed over unbiased training data provides is an unbiased (but noisier) probe of the underlying distribution (as expected).

We now turn to the computational challenge of exploiting this training data. In particular, taking a brute-force approach (as we did above) is prohibitively slow. It also penalizes you harshly for expanding your training sample, which seems counter-productive.

To solve this issue (somewhat foreshadowed in our KDE discussion above), we turn to "machine learning". Specifically, we want to use machine learning to give us a sparse approximation of the true likelihood distribution using a small but unbiased set of neighbors. In other words, we want to select a small subset of objects ($N_{\mathrm{neighbor}} \ll N_{\mathrm{train}}$) and use the likelihoods evaluated over only those objects as an approximation of the overall likelihood.

Unfortunately, finding a subset of nearest neighbors in likelihood-space is not straightforward since the likelihood depends on the pairwise combined errors of any two objects $g$ and $h$. One way that frankenz tries to get around this problem is by supplementing $k$ nearest-neighbor searches using Monte Carlo methods: by re-doing our $k$ nearest-neighbor search over $K$ Monte Carlo realizations of $\mathbf{g}$ and $\mathbf{h}$ and taking the union of all neighbors, we can generate a consistent estimate of the set of "true" nearest neighbors. We refer to this as KMCkNN for short.


In [23]:
from frankenz.fitting import NearestNeighbors

flux_zeropoint = 10**(-0.4 * -23.9) # AB magnitude zeropoint
fdepths = np.array([f['depth_flux1sig'] for f in survey.filters])  # 5-sigma depths

# initialize NN object
train_NN = NearestNeighbors(phot_train, err_train, mask_train,
                            feature_map='luptitude',
                            fmap_args=[fdepths, flux_zeropoint])


25/25 KDTrees constructed

In [24]:
# fit and generate PDFs
pdfs_nn, gofs = train_NN.fit_predict(phot_test, err_test, mask_test,
                                     redshifts[train_sel], 
                                     np.ones_like(train_sel) * rsmooth,
                                     label_dict=rdict, return_gof=True,
                                     save_fits=False)
lmap_nn, levid_nn = gofs


Generating PDF 5000/5000

In [25]:
# plot stacked PDFs
data.append(pdfs_nn)
labels.append('Samples (KMCkNN)')
colors.append('darkviolet')
plot_zstack(data, labels, colors)



In [26]:
# plot 2-D stacked PDFs
plot_zstack2d(data, labels)



In [27]:
# plot CDF tests
levids.append(levid_nn)
lmaps.append(lmap_nn)
plot_cdfs(data, levids, lmaps, labels, 0.05)


Our results look pretty decent, so that's reassuring.


In [28]:
# clear memory
del train_NN
gc.collect()


Out[28]:
24049

Method 3: Self-Organizing Maps (SelfOrganizingMap)

One issue with the KMCkNN method outlined above is that the maximum number of neighbors is limited to $K \times k$ for any particular object. This is not a problem for objects at high/moderate signal-to-noise, where the number of relevant neighbors is often not that large. This does lead to a problem for lower-S/N data, since their larger errors mean that they are consistent with larger portions of the training data. Ideally, we'd be able to select all objects within the relevant $X$-$\sigma$ error bounds so that the number of neighbors scales with the S/N.

frankenz tries to accomplish this by trying to find an optimal set of $M_{\mathbf{h}} \ll N_{\mathbf{h}}$ "effective SEDs" (Spectral Energy Distributions) that can mimic the distribution of observables in $\mathbf{h}$. These then form a network (comprised of SED nodes) across the set of observables. Once this network has been constructed, each model in $\mathbf{h}$ is then mapped back onto the network, so that a particular node (effective SED) becomes associated with a set of closely-matching objects in $\mathbf{h}$.

One approach to constructing such a network is a Self-Organizing Map (SOM), which aims to construct a network that is "topologically smooth" in some reduced $M_{\mathbf{b}} < N_{\mathbf{b}}$ dimensional space. This has the added benefit of providing a $M_{\mathbf{b}}$-dimensional projection of the $N_{\mathbf{b}}$-dimensional data, albeit at that cost of assuming "smoothness" in higher-dimensional SED space.

While the geometry of the SOM is flexible in theory, frankenz currently only allows for a hyper-cube-style geometries with fixed side lengths (default: 50x50). Since it is actually quite difficult to ensure "good" behavior when training networks directly on photometric flux densities due to the logarithmic number densities and scalings involved, by default the networks in frankenz train on "colors" (i.e. flux density ratios), which tries to optimize out these scale differences.


In [29]:
from frankenz.fitting import SelfOrganizingMap

train_SOM = SelfOrganizingMap(phot_train, err_train, mask_train)
train_SOM.train_network(err_kernel=0.01*phot_train)  # 1% smoothing
train_SOM.populate_network()


Iteration 2000/2000 [learn= 0.100, sigma= 1.000]     
Mapping objects 100%

We can quickly check the number density of objects over the SOM to see how well it is learning structure in the data.


In [30]:
# plot number density
plt.figure(figsize=(24, 9))
plt.subplot(1, 2, 1)
fzplot.plot2d_network(train_SOM, counts='absolute', plot_kwargs={'s': 150})
plt.tight_layout()
plt.subplot(1, 2, 2)
fzplot.plot2d_network(train_SOM, counts='weighted', plot_kwargs={'s': 150})
plt.tight_layout()


We see that the number density of objects is relatively spread out over the map, implying it has efficiently adapted to the distribution of objects. The "streaks" we see in terms of number density are real features in the data that arise from our limited set of templates: since the SOM assumes the underlying data space is smooth, it tries to interpolate between them by creating a small set of "unphysical" models.

We can confirm this by looking at the relative magnitude distribution and colors over the map.


In [31]:
# plot mag and color
plt.figure(figsize=(24, 9))
plt.subplot(1, 2, 1)
r_som = fzplot.plot2d_network(train_SOM, 
                              labels=mag[train_sel, survey.ref_filter],
                              labels_err=magerr[train_sel, survey.ref_filter],
                              label_name='r mag',
                              plot_kwargs={'s': 150})
plt.tight_layout()
plt.subplot(1, 2, 2)
ri_som = fzplot.plot2d_network(train_SOM, 
                              labels=(mag[train_sel, survey.ref_filter] - 
                                      mag[train_sel, survey.ref_filter+1]),
                              labels_err=np.sqrt(magerr[train_sel, survey.ref_filter]**2 +
                                                 magerr[train_sel, survey.ref_filter+1]**2),
                              label_name='r-i color',
                              plot_kwargs={'s': 150})
plt.tight_layout()


Computing r mag estimate 2500/2500
Computing r-i color estimate 2500/2500

While our map varies smoothly in color, only objects with very large errors are being assigned to the "streaks", confirming that these are indeed real features in the data.

finally, using plot_node we can also check what some of the SEDs at any particular nodes looks like.


In [32]:
# plot a random node
plt.figure(figsize=(12, 6))
fwave = [f['lambda_eff'] for f in survey.filters]
fzplot.plot_node(train_SOM, phot_train, err_train, idx=1875, models_x=fwave)
plt.xlabel('Wavelength [A]')
plt.ylabel('Flux Density')
plt.title('Node {0}'.format(train_SOM.nodes_pos[1875]))
plt.tight_layout()


Node-based (1-Level) Inference

We can use the network to approximate the PDF by using a "one-level" scheme where we just fit the nodes $\mathbf{i}$ and the associated models $\lbrace \mathbf{F}_i \rbrace$ and then stack the PDFs for the models associated with each node, i.e.

$$ P(z|g) \propto \sum_i \sum_h P(z|h) P(i|h) P(i|g) = \sum_i N(z|i) P(i|g)$$

We can access this approximation in frankenz by using the nodes_only=True flag when computing fits.


In [33]:
# use PDFs computed for each node
pdfs_som_approx, gofs = train_SOM.fit_predict(phot_test, err_test, mask_test,
                                              redshifts[train_sel], 
                                              np.ones_like(train_sel) * rsmooth,
                                              label_dict=rdict, return_gof=True,
                                              nodes_only=True, save_fits=False)
lmap_som_approx, levid_som_approx = gofs


Generating node PDF 2500/2500
Generating PDF 5000/5000

Model-based (2-Level) Inference

The SOM can also be used in a "two-level" scheme to perform more "exact" inference using the underlying set of models. This involves (1) fitting the nodes of the network (SOM) to each object and then (2) fitting the models associated with each node. This essentially tries to use the SOM as a way to accelerate our search over the set of models $\mathbf{h}$.

Our original SOM, however, which utilizes probabilistic mappings, is way too diffuse to really speed up photo-z predictions given the effective signal-to-noise of a good chunk of the sample. We can cut down on the number of models we try to fit by using the discrete=True argument, which truncates model associations to a single cell rather than using all probabilistic associations mapped onto the SOM.


In [34]:
# only use objects associated with the best-fitting node
pdfs_som, gofs = train_SOM.fit_predict(phot_test, err_test, mask_test,
                                       redshifts[train_sel], 
                                       np.ones_like(train_sel) * rsmooth,
                                       label_dict=rdict, return_gof=True,
                                       discrete=True, save_fits=False)
lmap_som, levid_som = gofs


Generating PDF 5000/5000

Let's see how both our approximations do.


In [35]:
# plot stacked PDFs
data.append(pdfs_som)
labels.append('Samples (SOM; discrete)')
colors.append('seagreen')
data.append(pdfs_som_approx)
labels.append('Samples (SOM; nodes only)')
colors.append('darkgreen')
plot_zstack(data, labels, colors)



In [36]:
# plot 2-D stacked PDFs
plot_zstack2d(data, labels)



In [37]:
# plot CDF tests
levids.append(levid_som)
lmaps.append(lmap_som)
levids.append(levid_som_approx)
lmaps.append(lmap_som_approx)
plot_cdfs(data, levids, lmaps, labels, 0.05)


The results look very similar to those from our KMCkNN estimator for this sample, showing that our SOM-based approach appears to work well. As expected, our 1-level approximation is less effective than the 2-level approach.


In [38]:
# clear memory
del train_SOM
gc.collect()


Out[38]:
36249

Method 4: Growing Neural Gas (GrowingNeuralGas)

Two common issues that arise when training the SOM is that the topology (via the neighborhood function) and shape of the network must be specified ahead of time. While it is possible to use, e.g., the Bayesian evidence across the network to optimize the number of nodes (we will return to this later), ideally we would like to have a network that can both grow and adapt to local structure (and possible breakages) in the data.

This can be accomplished by trying to add structure to our network by transforming it into a graph, where edges between nodes define our topology (i.e. what constitutes a "neighbor" in our higher-dimensional space). frankenz implements one version of this type of algorithm known as a Growing Neural Gas (GNG). The GNG does three things:

  1. As training progresses, it adds nodes over time to try and better fit the data.
  2. Based on the results of the training at a given iteration, it forms (and rejuvenates) connections between nodes to establish topology.
  3. Over time, it also prunes old connections to keep the overall structure (relatively) sparse and ensure disconnected sub-networks can form.

While training a GNG is substantially more intensive than training a SOM, the resulting networks are often more efficiently allocated across the data space (i.e. it better avoids creating "unphysical" models to interpolate between gaps in the data).


In [39]:
from frankenz.fitting import GrowingNeuralGas

train_GNG = GrowingNeuralGas(phot_train, err_train, mask_train)
train_GNG.train_network(err_kernel=0.01*phot_train)
train_GNG.populate_network()


Iteration 5000/5000 [nodes=2500, edges pruned=18] 
Mapping objects 100%

Since the GNG doesn't impose any direct topological requirements (as opposed to the SOM), we have to do some additional manifold learning to deproject the network. This can be done using scikit-learn's manifold module. We found Spectral Embedding appears to work reasonably well when applied to the flux ratios, so that's what we'll use here.


In [40]:
from sklearn import manifold
SE = manifold.SpectralEmbedding()
train_GNG.nodes_pos = SE.fit_transform(train_GNG.nodes / 
                                       train_GNG.nodes[:, survey.ref_filter, None])

In [41]:
# plot a random node
plt.figure(figsize=(12, 6))
fzplot.plot_node(train_GNG, phot_train, err_train, idx=1875, models_x=fwave)
plt.xlabel('Wavelength [A]')
plt.ylabel('Flux Density')
plt.title('Node {0}'.format(1875))
plt.tight_layout()



In [42]:
# plot number density
plt.figure(figsize=(24, 9))
plt.subplot(1, 2, 1)
fzplot.plot2d_network(train_GNG, counts='absolute', plot_kwargs={'s': 150})
plt.tight_layout()
plt.subplot(1, 2, 2)
fzplot.plot2d_network(train_GNG, counts='weighted', plot_kwargs={'s': 150})
plt.tight_layout()



In [43]:
# plot mag and color
plt.figure(figsize=(24, 9))
plt.subplot(1, 2, 1)
r_gng = fzplot.plot2d_network(train_GNG, 
                              labels=mag[train_sel, survey.ref_filter],
                              labels_err=magerr[train_sel, survey.ref_filter],
                              label_name='r mag',
                              plot_kwargs={'s': 150})
plt.tight_layout()
plt.subplot(1, 2, 2)
ri_gng = fzplot.plot2d_network(train_GNG, 
                              labels=(mag[train_sel, survey.ref_filter] - 
                                      mag[train_sel, survey.ref_filter+1]),
                              labels_err=np.sqrt(magerr[train_sel, survey.ref_filter]**2 +
                                                 magerr[train_sel, survey.ref_filter+1]**2),
                              label_name='r-i color',
                              plot_kwargs={'s': 150})
plt.tight_layout()


Computing r mag estimate 2500/2500
Computing r-i color estimate 2500/2500

As with the SOM, our GNG has effectively learned the topology of our training data. Now let's compute the associated approximate and "exact" redshift PDFs.


In [44]:
# generate approximate PDFs from the GNG
pdfs_gng_approx, gofs = train_GNG.fit_predict(phot_test, err_test, mask_test,
                                              redshifts[train_sel], 
                                              np.ones_like(train_sel) * rsmooth,
                                              nodes_only=True, label_dict=rdict, 
                                              return_gof=True, save_fits=False)
lmap_gng_approx, levid_gng_approx = gofs


Generating node PDF 2500/2500
Generating PDF 5000/5000

In [45]:
# generate "exact" PDFs from the GNG
pdfs_gng, gofs = train_GNG.fit_predict(phot_test, err_test, mask_test,
                                       redshifts[train_sel], 
                                       np.ones_like(train_sel) * rsmooth,
                                       label_dict=rdict, return_gof=True,
                                       discrete=True, save_fits=False)
lmap_gng, levid_gng = gofs


Generating PDF 5000/5000

As before, let's see how both our approximations do.


In [46]:
# plot stacked PDFs
data.append(pdfs_gng)
labels.append('Samples (GNG; discrete)')
colors.append('firebrick')
data.append(pdfs_gng_approx)
labels.append('Samples (GNG; nodes only)')
colors.append('indianred')
plot_zstack(data, labels, colors)



In [47]:
# plot 2-D stacked PDFs
plot_zstack2d(data, labels)



In [48]:
# plot CDF tests
levids.append(levid_gng)
lmaps.append(lmap_gng)
levids.append(levid_gng_approx)
lmaps.append(lmap_gng_approx)
plot_cdfs(data, levids, lmaps, labels, 0.05)


The results for the GNG look very similar to those from our SOM-based estimators for this sample.


In [49]:
# clear memory
del train_GNG
gc.collect()


Out[49]:
19955